Over the last 30 years, the approach to dating has changed and finding a partner has become increasingly difficult: the willingness to date has decreased, dating is expensive and time-consuming, we have too many (perceived) options, and we too easily accept negative sex stereotypes. Dating events have a long history. In the 19th-century United States, a custom called New Year’s Calling had many young, single women hold an Open House (a party or reception during which a person’s home is open to visitors) on 1 January, inviting eligible bachelors, both friends and strangers, to stop by for a brief (no more than 10–15-minute) visit. The modern form of this custom was established under the registered trademark SpeedDating by Aish HaTorah, which began hosting SpeedDating events in 1998.
Ten years later, Fisman et al. collected 8,000 observations of speed-dating behaviour over a two-year period for their paper Gender Differences in Mate Selection: Evidence from a Speed Dating Experiment. Because speed dating has attracted growing interest in recent years, and because the Corona pandemic gave rise to a completely new dating approach, we wanted to analyse this dataset with the following questions in mind:
What are the most effective personal characteristics to achieve a match in opposite sex speed dating?
A match may be expressed as a like value (1 - 10, regression) or as a binary match (1 or 0, classification)
The following hypotheses support our research question:
Null hypothesis:
Having specific characteristics has no effect on the match selection of the survey participants
There is no correlation between shared interests or attributes and getting a match
Hypotheses:
Survey participants who share the specific characteristics same race and opposite gender tend to achieve more matches
Survey participants with a higher income tend to achieve more matches than survey participants with a lower income
Achieving matches because of sharing specific characteristics occurs in both sexes
Three weeks after the event, men called women more often
Our dataset proved very helpful in answering these and further questions, as it contains many useful features.
We want to answer our research questions in five steps:
Step 1 Importing the required libraries
Step 2 Cleaning the dataset
Step 3 Analyzing the dataset
Step 4 Preparing the model
Step 5 Analyzing the model
The main variables we want to look at to answer our research questions are ‘Match’ (the outcome variable for the classification), together with the personal attributes/features as predictors, and ‘Like’ (the outcome for the regression). For all variables, we use descriptive terms to make them easier to recognize. First, we want to analyze the importance of each personal attribute for achieving a match (classification) on the one hand and for the strength of a like (regression) on the other hand.
For the calls variable, we assume that NaN means zero calls. This is of course only an estimate: the value was collected after the events, so few participants answered this question. For all other attributes we drop the NaN values because their ratio is rather small.
def plot_missing(df):
    '''
    This function shows the distribution of missing values for all
    variables of the dataset in the form of a graph.
    '''
    plt.figure(figsize=(10, 4), dpi=80)
    sns.heatmap(df.isnull().transpose(), xticklabels=False,
                cbar=False, cmap='viridis')
    plt.title('Distribution of Missing Values among Variables', fontsize=20)
    plt.show()
Overall, there are a lot of missing values for questions that were asked after the events like the *_2 and *_3 attributes, so we concentrated on the answers given at the events.
Show the code
plot_missing(df)
Show the code
fig, ax2 = plt.subplots(ncols=2)
g = sns.histplot(
    data=df[features_classification].isna().melt(value_name="NaN"),
    y="variable", hue="NaN", multiple="fill", ax=ax2[0])
g2 = sns.histplot(
    data=df[features_regression].isna().melt(value_name="NaN"),
    y="variable", hue="NaN", multiple="fill", ax=ax2[1])
ax2[0].set_title("Classification Features NaN Values")
ax2[1].set_title("Regression Features NaN Values")
plt.legend([], [], frameon=False)
fig.tight_layout()
After cleaning our dataset and our initial exploratory data analysis, we can see the relationships between the respective outcome and possible predictors for each of the classification and regression.
For regression we use the following models:
Linear Regression
Multiple Regression
Lasso Regression
XGBoost Regression
The considered metrics for regression are:
R2-Score
Mean Squared Error
Mean Absolute Error
Root Mean Squared Error
For classification we use the following models:
Logistic Regression
The considered metrics for logistic regression are:
Confusion Matrix
Precision, Recall, Accuracy and F1 scores
Precision-Recall curve
ROC curve and AUC value
Model selection process for Classification: Besides logistic regression there are other classification algorithms such as Naïve Bayes, Stochastic Gradient Descent, K-Nearest Neighbours, Decision Tree, Random Forest and Support Vector Machine. Since we need a machine learning algorithm that is well suited to understanding the influence of several independent variables on a single outcome variable, we use Logistic Regression, which models the probabilities of the possible outcomes of a single trial.
For the Logistic Regression, we use the LogisticRegressionCV model. By default, this model performs 5-fold cross-validation with Stratified K-Folds, so there is no need for a separate training and validation split. See the Scikit-Learn documentation.
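As a minimal sketch of this setup (the data here is synthetic stand-in data, not our feature matrix):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Toy data standing in for our feature matrix (hypothetical, shapes only)
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# By default LogisticRegressionCV searches a grid of C values using
# stratified 5-fold cross-validation, so no separate validation split is needed
clf = LogisticRegressionCV(cv=5, max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.C_)  # regularisation strength chosen by cross-validation
```

The `C_` attribute then holds the regularisation strength picked by the internal stratified 5-fold search.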
Model selection process for Regression: For the regression analysis we need a type of predictive modelling that investigates the relationship between a dependent variable (target) and independent variables (predictors). This technique is used for forecasting, time-series modelling and finding cause-effect relationships between variables.
Evaluating model accuracy is an essential part of assessing how well a machine learning model performs in its predictions. The basic idea is to compare the original target with the predicted one according to certain metrics. We use different models and interpret their values. We start with linear regression to model the relationship between the features and the target variable. Second, we use Lasso regression, a type of linear regression that uses shrinkage: data values are shrunk towards a central point, such as the mean. Like Ridge regression, Lasso is well suited for models showing high levels of multicollinearity, or when you want to automate parts of model selection, such as variable selection/parameter elimination. Finally, XGBoost (Extreme Gradient Boosting) is an optimized distributed gradient boosting library for supervised machine learning problems. XGBoost belongs to a family of boosting algorithms that convert weak learners into strong learners.
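The shrinkage effect of Lasso can be illustrated with a small synthetic example (the data here is hypothetical, not from the survey):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features actually matter in this toy setup (an assumption)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print(np.count_nonzero(ols.coef_))    # OLS keeps all coefficients non-zero
print(np.count_nonzero(lasso.coef_))  # Lasso shrinks irrelevant ones to exactly 0
```

With a sufficiently large alpha, the coefficients of the uninformative features are driven to exactly zero, which is what makes Lasso useful for variable selection.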
Regression
To get the most important features, we perform a SelectKBest selection with the f_regression scoring function on all numerical variables, keeping the 20 highest-scoring variables for the regression analysis.
F-regression is a univariate linear regression test returning F-statistics and p-values.
It is a quick linear model for testing the effect of a single regressor, applied sequentially to many regressors. This is done in two steps:
The cross-correlation between each regressor and the target is computed using r_regression as: \[\frac{E[(X[:, i] - \mathrm{mean}(X[:, i]))\,(y - \mathrm{mean}(y))]}{\mathrm{std}(X[:, i]) \cdot \mathrm{std}(y)}\]
It is converted to an F score and then to a p-value. f_regression is derived from r_regression and will rank features in the same order if all the features are positively correlated with the target.
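A minimal sketch of this selection step, using synthetic data in place of our cleaned dataframe (`k` is set to 3 here instead of 20 to keep the toy example small):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Toy stand-in for the numeric speed-dating features (hypothetical data)
X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       random_state=0)

# Keep the k highest-scoring features according to the univariate F-test
selector = SelectKBest(score_func=f_regression, k=3)
X_top = selector.fit_transform(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```

`scores_` and `pvalues_` on the fitted selector expose the per-feature F-statistics and p-values used for the ranking.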
Looking at the bar chart, we can see that Attractivity, Humor, Sincerity, Intelligence, Shared Interests, Probability and Ambition are by far the most impactful features.
To make sure that there is no multicollinearity between the variables used, we checked for it.
As the graph below shows, there is almost no collinearity, apart from Intelligence with Sincerity and Ambition, and Humor with Shared Interests and Attractivity.
Show the code
# Inspect correlation
# Calculate correlation using the default method ("pearson")
corr = df_r.corr()
# Optimize aesthetics: generate mask for removing duplicate / unnecessary info
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
# Generate a custom diverging colormap as indicator for correlations:
cmap = sns.diverging_palette(220, 10, as_cmap=True)
# Plot
sns.heatmap(corr, mask=mask, cmap=cmap, annot=True, square=True,
            annot_kws={"size": 12}).set(title="Multicolinearity Check")
Furthermore, all variables have a linear relationship with the target, so splines or polynomial regression models are not required.
Show the code
alt.Chart(df_r).mark_bar().encode(
    alt.X(alt.repeat("column"), type="quantitative", bin=True),
    y='count()',
).properties(
    width=150, height=150
).repeat(
    column=features_regression
).properties(title="Count of records per Feature")
With the most important features selected, Linear Regression, Lasso Regression and XGBoost linear regression were chosen as models to train on the dataset. A grid search determined the following hyperparameter for Lasso: alpha = 0.018149939617142067.
A grid search determined the following hyperparameters for XGBoost: learning_rate=0.08, n_estimators=700, booster='gblinear'.
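The grid search itself can be sketched as follows; the data and the alpha grid here are hypothetical, not the grid that produced the values above:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Hypothetical data and alpha grid (illustrative only)
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
param_grid = {"alpha": np.logspace(-3, 0, 10)}

# 5-fold cross-validated search over the alpha grid
grid = GridSearchCV(Lasso(max_iter=10000), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_["alpha"])  # alpha with the best mean CV score
```

The same pattern applies to the XGBoost hyperparameters, only with a grid over learning_rate and n_estimators instead.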
Show the code
reg = LinearRegression()
reg.fit(X_r_train, y_r_train)

lasso = LassoCV(cv=5, random_state=0, max_iter=15000)
lasso.fit(X_r_train, y_r_train)
# Set best alpha
lasso_best = Lasso(alpha=lasso.alpha_)
lasso_best.fit(X_r_train, y_r_train)

regxg = xg.XGBRegressor(learning_rate=0.08, n_estimators=700, booster='gblinear')
regxg.fit(X_r_train, y_r_train)
regxgnolin = xg.XGBRegressor(learning_rate=0.015, max_depth=2, n_estimators=700)
regxgnolin.fit(X_r_train, y_r_train)

y_r__reg_pred = reg.predict(X_r_test)
y_r__lasso_pred = lasso_best.predict(X_r_test)
y_r__xgboost_pred = regxg.predict(X_r_test)
The following formulas are used to determine the MAE, MSE, RMSE and R2 score:
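For reference, these are the standard definitions, with \(y_i\) the observed values, \(\hat{y}_i\) the predicted values and \(\bar{y}\) the mean of the observations:

\[MAE = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert\]

\[MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]

\[RMSE = \sqrt{MSE}\]

\[R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}\]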
The first impression of the data is that all attributes matter; it is best to be rated around 7 - 8 to get a match.
We can also see that the data is very unbalanced: only about one fifth of the dataset is marked as a match, while the other four fifths are no match. This may influence the model.
df_train_c = pd.DataFrame(X_train_c).copy()
df_train_c[y_label_c] = pd.DataFrame(y_train_c)
c1 = alt.Chart(df_train_c).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(y_label_c),
    alt.Size('count()'),
    alt.Color(y_label_c),
    tooltip=['count()']
).properties(
    width=250, height=150
).repeat(
    column=personal_features
).interactive().properties(title="Attribute importance to get a match")
c2 = alt.Chart(df_train_c).mark_bar().encode(
    alt.X(y_label_c),
    alt.Y('count()'),
    alt.Color(y_label_c),
    tooltip=['count()']
).properties(
    width=250, height=150
).interactive().properties(title="Match and no-match comparison")
c1 & c2
When we are looking at those charts we search for differences between the match and no match results.
For humor, shared interests, attractivity and probability, we can see again that the match curves start to rise at around five, with a peak at 7 - 8. The importance of same race drops more strongly for a match than for no match.
For the other attributes, the charts look pretty similar just with lower amplitudes.
We can conclude that the mentioned attributes are probably important for the model in contrast to the others.
Show the code
features = personal_features + num_features
fig, ax = plt.subplots(ncols=len(features))
for count, value in enumerate(features):
    sns.kdeplot(data=df, x=value, hue="Match", ax=ax[count], fill=True)
fig.set_size_inches(40, 5)
fig.suptitle("Classification Features")
fig.tight_layout()
Model selection
The most important coefficients for a positive correlation in our model are Attractivity, followed by Humor, Interests_correlation and Probability. There are also strong negative correlations with Gender and Ambition.
Based on the metrics, our model performs poorly, predicting mostly no matches. The F1 score for a match is very low (0.24). There are only 33 cases where we correctly predict a match.
This may be based on the origin data where we have a lot more no-match entries than matches.
Show the code
def metrics_confusionmatrix(log_reg, X_test_c, y_test_c):
    '''
    This function predicts y and prints the confusion matrix and
    classification report. Returns y_pred_c and y_proba_c.
    '''
    y_pred_c = log_reg.predict(X_test_c)
    y_proba_c = log_reg.predict_proba(X_test_c)[:, 1]
    fig, ax = plt.subplots(figsize=(3, 3))
    ConfusionMatrixDisplay.from_estimator(log_reg, X_test_c, y_test_c,
                                          ax=ax, colorbar=False)
    print(classification_report(y_test_c, y_pred_c,
                                target_names=['No match', 'Match']))
    return y_pred_c, y_proba_c
Show the code
def metrics_calculator(y_test, y_pred, model_name):
    '''
    This function calculates all desired performance metrics
    for a given model.
    '''
    result = pd.DataFrame(data=[accuracy_score(y_test, y_pred),
                                precision_score(y_test, y_pred, average='macro'),
                                recall_score(y_test, y_pred, average='macro'),
                                f1_score(y_test, y_pred, average='macro')],
                          index=['Accuracy', 'Precision', 'Recall', 'F1-score'],
                          columns=[model_name])
    return result
Show the code
def metrics_precisionrecall(log_reg, X_test_c, y_test_c, model_name):
    '''
    This function displays the PrecisionRecallDisplay.
    '''
    fig, ax = plt.subplots(figsize=(5, 3))
    PrecisionRecallDisplay.from_estimator(log_reg, X_test_c, y_test_c,
                                          name=model_name, ax=ax)
Show the code
def metrics_roc(log_reg, X_test_c, y_test_c, y_proba_c, model_name):
    '''
    This function displays the RocCurveDisplay and roc_auc_score.
    '''
    fig, ax = plt.subplots(figsize=(5, 3))
    RocCurveDisplay.from_estimator(log_reg, X_test_c, y_test_c,
                                   name=model_name, ax=ax)
    fpr, tpr, thresholds = roc_curve(y_test_c, y_proba_c)
    print(f"The AUC score is: {roc_auc_score(y_test_c, y_proba_c)}")
    return fpr, tpr
The F1-score is calculated as 2 * Precision * Recall / (Precision + Recall), i.e. the harmonic mean of precision and recall. Values closer to 1 indicate better performance.
The Accuracy is calculated by: Number of correct predictions / Total number of predictions. It’s the portion of correct predictions.
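Both metrics can be computed directly from the four cells of a confusion matrix; the counts below are illustrative, not our exact results:

```python
# Counts from a hypothetical confusion matrix for the match class
# (tp, fp, fn, tn are illustrative numbers, not the report's exact ones)
tp, fp, fn, tn = 152, 49, 51, 148

precision = tp / (tp + fp)                  # of predicted matches, how many are real
recall = tp / (tp + fn)                     # of real matches, how many we found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)  # portion of correct predictions
print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))
```

Note that a large true-negative count (tn) inflates accuracy without affecting precision or recall, which is exactly the effect discussed below.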
Because our model does a lot of correct (true negative) predictions, the Accuracy score is high.
The Receiver Operating Characteristic (ROC) Curve summarizes the trade-off between the true positive rate and false positive rate.
ROC curves are appropriate when the observations are balanced between each class, whereas precision-recall curves are appropriate for imbalanced datasets.
Therefore the ROC curve looks good, even though our model is in fact bad. This is caused by the high true negative rate, which is taken into account in the ROC but not in the precision-recall metric.
We can try to balance the class counts and train the model again.
Show the code
df_new = pd.concat([df_c[df_c[y_label_c] == 0][:1000],
                    df_c[df_c[y_label_c] == 1][:1000]])
alt.Chart(df_new).mark_bar().encode(
    alt.X(y_label_c),
    alt.Y('count()'),
    alt.Color(y_label_c),
    tooltip=['count()']
).properties(
    width=250, height=150
).interactive().properties(title="1000 match and no-match data")
Result
Our model looks a lot better now, with an F1 score of 0.75. We correctly predict a match in 152 cases and a no-match in 148 cases, so we are right in about three quarters of all cases, and the Accuracy is 75%.
Depending on the dating behaviour, it may be better to maximise the precision of match (have a lot of dates but fewer matches) or the recall (have fewer dates but more matches). Because a match is still very personal, it is probably better to tune for precision.
              precision    recall  f1-score   support

    No match       0.75      0.77      0.76       199
       Match       0.77      0.75      0.76       201

    accuracy                           0.76       400
   macro avg       0.76      0.76      0.76       400
weighted avg       0.76      0.76      0.76       400
Comparing the models, there is no big difference between the first model trained on a dataset of 2,000 observations and the tuned one, but balancing the dataset had a great effect on the model.
final_scores_t = final_scores.T
final_scores_t['Name'] = final_scores.columns
final_scores_t

accuracy = alt.Chart(final_scores_t).mark_bar().encode(
    y='Name', x='Accuracy:Q', color='Name', tooltip='Accuracy:Q')
a_text = accuracy.mark_text(
    align='center', baseline='middle',
    dx=45  # Nudge text right so it doesn't sit on top of the bar
).encode(text='Accuracy:Q')

precision = alt.Chart(final_scores_t).mark_bar().encode(
    y='Name', x='Precision:Q', color='Name', tooltip='Precision:Q')
p_text = precision.mark_text(
    align='center', baseline='middle', dx=45
).encode(text='Precision:Q')

recall = alt.Chart(final_scores_t).mark_bar().encode(
    y='Name', x='Recall:Q', color='Name', tooltip='Recall:Q')
r_text = recall.mark_text(
    align='center', baseline='middle', dx=45
).encode(text='Recall:Q')

f1 = alt.Chart(final_scores_t).mark_bar().encode(
    y='Name', x='F1-score:Q', color='Name', tooltip='F1-score:Q')
f1_text = f1.mark_text(
    align='center', baseline='middle', dx=45
).encode(text='F1-score:Q')

(precision + p_text) & (recall + r_text) & (accuracy + a_text) & (f1 + f1_text)
With the data from this survey and our used statistical methods and classification, we could answer the following research questions:
What are the most effective personal characteristics to achieve a match in opposite sex speed dating?
According to our model parameters, the most important features for a positive correlation are attractivity, shared interests, humor, and the other person also showing some interest. On the other hand, there is a negative correlation with being of the same gender and race and with being (maybe too) ambitious and sincere.
Null hypothesis:
Having specific characteristics has no effect on the match selection of the survey participants
This does not hold true; the relevant characteristics are described above.
There is no correlation between shared interests or attributes and getting a match
This does not hold true either; the relevant characteristics are described above.
Hypotheses:
Survey participants who share the specific characteristics same race and opposite gender tend to achieve more matches
We didn’t investigate this in detail, but we saw a rather negative correlation between same race and match.
Survey participants with a higher income tend to achieve more matches than survey participants with a lower income
We didn’t investigate that in detail, as the income wasn’t an important feature for our model.
Achieving matches because of sharing specific characteristics occurs in both sexes
Yes, this hypothesis is true.
Three weeks after the event, men called women more often
Yes, by a factor of four. See the appendix.
With our report, we contribute to a better understanding of speed dating and of participants’ preferences. Our paper serves as a starting point for understanding the preferences underlying the search for a partner. Prior work has shown how matches are achieved; in this report we compare the required features and give an example of which attributes a speed dating participant needs in order to achieve matches and likes. We use an explorative data analysis approach that allows us to directly observe individual decisions.
There are a number of ways in which our work could be improved. Due to the limitations of the data collection method (a local survey in only one country), we have a very specific distribution of races among the speed dating participants. Also, in terms of the validity of our dataset, gender politics have changed since 2008, and we largely ignored gender diversity and focused only on men and women, although those two genders don’t show a significant difference in the data. Most notably, a similar methodology could be applied to a newer dataset, because ours is more than ten years old.
Appendix
Data Dictionary
Descriptive terms for the variables we used

| Name | Description | Descriptive term |
|------|-------------|------------------|
| calls | Event of a participant conducting a “you_call” or “them_cal” with the other party | Calls of participants |
| attr | Rating of the attribute for this person from 1 - 10. | Attractivity of speed dating participant |
| sinc | Rating of the attribute for this person from 1 - 10. | Sincerity of speed dating participant |
| intel | Rating of the attribute for this person from 1 - 10. | Intelligence of speed dating participant |
| fun | Rating of the attribute for this person from 1 - 10. | Humor of speed dating participant |
| amb | Rating of the attribute for this person from 1 - 10. | Ambition of speed dating participant |
| shar | Rating of the attribute for this person from 1 - 10. | Shared Interests/Hobbies of the speed dating participant with the other party |
| like | Overall, how much do you like this person? 1 (don’t like at all) to 10 (like a lot) | Strength of like of speed dating participant for the other party |
| prob | How probable do you think it is that this person will say ‘yes’ for you? 1 (not probable) to 10 (extremely probable) | Probability of speed dating participant liking the other party |
| met | Have you met this person before? (1 = yes, 2 = no) | Meeting indicator of participants |
| gender | Gender of the person. Female = 0, Male = 1 | Gender of speed dating participant |
| order | The number of the date that night when the partner was met | Order of date of speed dating participant and the other party during the event |
| match | 1 = yes, 0 = no | Match of the speed dating participant and the other party |
| int_corr | Correlation between participant’s and partner’s ratings of interests in Time 1 | Interest correlation of the speed dating participant and the other party |
| samerace | Participant and partner were of the same race. 1 = yes, 0 = no | Indicates whether the speed dating participant and the other party have the same race |
| age | Age of the person | Age of speed dating participant |
| age_o | Age of partner | Age of other party |
| race | Race of the attendee: 1 = Black/African American, 2 = European/Caucasian-American, 3 = Latino/Hispanic American, 4 = Asian/Pacific Islander/Asian-American, 5 = Native American, 6 = Other | Race of speed dating participant |
| race_o | Race of partner | Race of other party |
| imprace | How important is it that a person you date be of the same racial/ethnic background? (1 - 10) | Importance of the other party having the same race as the speed dating participant |
| intel_o | Intelligent. Rating by partner the night of the event from 1 (awful) to 10 (great) | Intelligence of the other party |
| sinc_o | Sincere. Rating by partner the night of the event from 1 (awful) to 10 (great) | Sincerity of the other party |
| like_o | Overall, how much do you like this person? 1 (don’t like at all) to 10 (like a lot) | Strength of like for the other party |
| prob_o | How probable do you think it is that this person will say ‘yes’ for you? 1 (not probable) to 10 (extremely probable) | Probability of the other party liking the speed dating participant |
| fun_o | Fun. Rating by partner the night of the event from 1 (awful) to 10 (great) | Humor of the other party |
| satis_2 | Generic Id | Generic Id |
| amb_o | Ambitious. Rating by partner the night of the event from 1 (awful) to 10 (great) | Ambition of the other party |
| shar_o | Shared Interests/Hobbies. Rating by partner the night of the event from 1 (awful) to 10 (great) | Shared Interests/Hobbies of the other party with the speed dating participant |
| attr_o | Attractive. Rating by partner the night of the event from 1 (awful) to 10 (great) | Attractivity of the other party |
| met_o | Have you met this person before? (1 = yes, 2 = no) | Meeting indicator of the other party |
| exphappy | Overall, on a scale of 1-10, how happy do you expect to be with the people you meet during the speed-dating event? | Expected happiness of meeting people |
| pid | Partner’s iid number | Partner’s iid number |
Tuning
The lower the threshold, the more false positives we get.
The higher the threshold, the more false negatives we get. If we tune it very aggressively, we can predict 22 partners, of whom 21 would be a real match.
This would make sense if we want to predict, for a given person, whom they should go on a date with.
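The threshold trade-off can be sketched with a handful of hypothetical predicted probabilities (the values below are illustrative, not model output):

```python
import numpy as np

# Hypothetical predicted match probabilities and true labels
proba = np.array([0.10, 0.30, 0.45, 0.60, 0.80, 0.90, 0.95, 0.20])
truth = np.array([0,    0,    1,    0,    1,    1,    1,    0])

for threshold in (0.3, 0.5, 0.9):
    pred = (proba >= threshold).astype(int)
    fp = int(((pred == 1) & (truth == 0)).sum())  # predicted match, was none
    fn = int(((pred == 0) & (truth == 1)).sum())  # missed a real match
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Raising the threshold trades false positives for false negatives, which is how tuning for precision over recall works in practice.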
We can see that far more males (2,422 calls) called females than the other way round (681). On the other hand, both sexes reported having been called more often than actual calls were made (males report 1,035 received vs. 681 made, females report 2,866 received vs. 2,422 made), so there may be some bias in these numbers or the data is incomplete.